Advances in Glottal Analysis and its Applications

Authors

  • Thomas Drugman
  • Thierry Dutoit
  • Baris Bozkurt
Abstract

From artificial voices in GPS devices to automatic dictation systems, and from voice-based identity verification to voice pathology detection, speech processing applications are now omnipresent in daily life. Because it offers companies a way to improve efficiency while cutting costs, the speech technology market is forecast to be particularly promising in the coming years. This thesis presents advances in glottal analysis with the aim of incorporating new techniques into speech processing applications. While current systems usually rely on information related to the vocal tract configuration, the airflow passing through the vocal folds, called the glottal flow, is expected to provide relevant complementary information. Unfortunately, glottal analysis from speech recordings requires specific, complex processing operations, which explains why it has generally been avoided.

The main goal of this thesis is to provide new advances in glottal analysis so as to popularize it in speech processing. First, new techniques for glottal excitation estimation and modeling are proposed and shown to outperform other state-of-the-art approaches on large corpora of real speech. Moreover, the proposed methods are integrated within various speech processing applications: speech synthesis, voice pathology detection, speaker recognition and expressive speech analysis. They are shown to lead to a substantial improvement over existing techniques.

More specifically, the thesis covers three separate but interconnected parts. In the first part, new algorithms for robust pitch tracking and for automatic determination of glottal closure instants are developed. This step is necessary because accurate glottal analysis requires processing pitch-synchronous speech frames. In the second part, a new non-parametric method based on the complex cepstrum is proposed for glottal flow estimation. In addition, a way to achieve this decomposition asynchronously is investigated. A comprehensive comparative study of glottal flow estimation approaches is also given. Relying on this expertise, the usefulness of glottal information for voice pathology detection and expressive speech analysis is explored. In the third part, a new excitation model, called the Deterministic plus Stochastic Model of the residual signal, is proposed. This model is applied to speech synthesis, where it is shown to enhance the naturalness and quality of the delivered voice. Finally, glottal signatures derived from this model are observed to increase identification rates in speaker recognition.
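
The abstract names a complex-cepstrum decomposition without describing it. As a rough illustration only, and not the thesis's actual algorithm, the sketch below computes the complex cepstrum of a frame and splits it into causal and anticausal parts. It assumes the common mixed-phase view of speech, in which the glottal open phase contributes anticausal (maximum-phase) components and the vocal tract contributes causal ones. The function names, the FFT size, and the two-tap toy signals are hypothetical choices made for this example, and the code assumes a frame with positive DC gain and no spectral zeros on the unit circle.

```python
import numpy as np

def complex_cepstrum(frame, n_fft=1024):
    """Complex cepstrum via log-magnitude and unwrapped phase (illustrative)."""
    spectrum = np.fft.fft(frame, n_fft)
    log_mag = np.log(np.abs(spectrum))
    phase = np.unwrap(np.angle(spectrum))
    # Remove the linear-phase term so the phase is periodic before the inverse FFT.
    half = n_fft // 2
    delay = int(round(phase[half] / np.pi))
    phase = phase - np.pi * delay * np.arange(n_fft) / half
    return np.fft.ifft(log_mag + 1j * phase).real

def split_cepstrum(cepstrum):
    """Split into causal (first half) and anticausal (second half) quefrencies."""
    n = len(cepstrum)
    half = n // 2
    causal = np.zeros(n)
    anticausal = np.zeros(n)
    causal[:half] = cepstrum[:half]      # index 0 kept with the causal part by convention
    anticausal[half:] = cepstrum[half:]  # negative quefrencies wrap to the end of the array
    return causal, anticausal

def cepstrum_to_signal(cepstrum):
    """Map a cepstrum back to a time-domain signal (circular, length n_fft)."""
    return np.fft.ifft(np.exp(np.fft.fft(cepstrum))).real

# Toy mixed-phase frame: a maximum-phase factor convolved with a minimum-phase one.
max_phase = np.array([-0.5, 1.0])   # zero outside the unit circle
min_phase = np.array([1.0, -0.3])   # zero inside the unit circle
frame = np.convolve(max_phase, min_phase)

causal, anticausal = split_cepstrum(complex_cepstrum(frame))
causal_signal = cepstrum_to_signal(causal)
anticausal_signal = cepstrum_to_signal(anticausal)
```

In this toy case the anticausal signal has its energy at index 0 and at the last index (time -1 under circular indexing), while the causal signal occupies indices 0 and 1. Real speech frames would additionally need pitch-synchronous windowing around glottal closure instants, which this sketch does not attempt.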

Similar resources

Steady Flow Through Modeled Glottal Constriction

The airflow in the modeled glottal constriction was simulated by the solutions of the Navier-Stokes equations for laminar flow, and the corresponding Reynolds equations for turbulent flow in generalized, nonorthogonal coordinates using a numerical method. A two-dimensional model of laryngeal flow is considered and aerodynamic properties are calculated for both laminar and turbulent steady flows...

Glottal closure and opening detection for flexible parametric voice coding

Knowledge of glottal closure and opening instants (GCI/GOI) is useful for many speech analysis applications. Pitch-synchronous waveform encoding of voice is one such application. In this paper, dynamic programming is employed to solve for the global close/open phase segmentation based on the polynomial parametric waveform of the derivative glottal waveform and its quasi-periodicity. Not ...

Glottal source processing: From analysis to applications

The great majority of current voice technology applications rely on acoustic features, such as the widely used MFCC or LP parameters, which characterize the vocal tract response. Nonetheless, the major source of excitation, namely the glottal flow, is expected to convey useful complementary information. The glottal flow is the airflow passing through the vocal folds at the glottis. Unfortunatel...

Improving Glottal Waveform Rank-based Glottal Qua

Information on the glottal waveform is an important part of many speech applications. However, glottal waveform estimation remains one of the more inexact sciences of speech processing. The work presented here describes an enhancement to a recently presented algorithm by a new technique involving Rank-Based Glottal Quality Assessment (RB-GQA). The basic premise is to investigate potential measu...

Epoch-based analysis of speech signals

Speech analysis is traditionally performed using short-time analysis to extract features in time and frequency domains. The window size for the analysis is fixed somewhat arbitrarily, mainly to account for the time varying vocal tract system during production. However, speech in its primary mode of excitation is produced due to impulse-like excitation in each glottal cycle. Anchoring the speech...

Publication date: 2011